class: center, middle, inverse, title-slide # ECON 3818 ## Chapter 2 ### Kyle Butts ### 21 July 2021 --- exclude: true --- class: clear, middle <!-- Custom css --> <style type="text/css"> @import url(https://fonts.googleapis.com/css?family=Zilla+Slab:300,300i,400,400i,500,500i,700,700i); /* Create a highlighted class called 'hi' */ .hi { font-weight: 600; } .bw { background-color: rgb(0, 0, 0); color: #ffffff; } .gw { background-color: #d2d2d2; color: #ffffff; } /* Font styling */ .mono { font-family: monospace; } .ul { text-decoration: underline; } .ol { text-decoration: overline; } .st { text-decoration: line-through; } .bf { font-weight: bold; } .it { font-style: italic; } /* Font Sizes */ .bigger { font-size: 125%; } .huge{ font-size: 150%; } .small { font-size: 95%; } .smaller { font-size: 85%; } .smallest { font-size: 75%; } .tiny { font-size: 50%; } /* Remark customization */ .clear .remark-slide-number { display: none; } .inverse .remark-slide-number { display: none; } .remark-code-line-highlighted { background-color: rgba(249, 39, 114, 0.5); } .remark-slide-content { background-color: #ffffff; font-size: 24px; /* font-weight: 300; */ /* line-height: 1.5; */ /* padding: 1em 2em 1em 2em; */ } /* Xaringan tweeks */ .inverse { background-color: #23373B; text-shadow: 0 0 20px #333; /* text-shadow: none; */ } .title-slide { background-color: #ffffff; border-top: 80px solid #ffffff; } .footnote { bottom: 1em; font-size: 80%; color: #7f7f7f; } /* Mono-spaced font, smaller */ .mono-small { font-family: monospace; font-size: 20px; } .mono-small .mjx-chtml { font-size: 103% !important; } .pseudocode, .pseudocode-small { font-family: monospace; background: #f8f8f8; border-radius: 3px; padding: 10px; padding-top: 0px; padding-bottom: 0px; } .pseudocode-small { font-size: 20px; } .super{ vertical-align: super; font-size: 70%; line-height: 1%; } .sub{ vertical-align: sub; font-size: 70%; line-height: 1%; } .remark-code { font-size: 68%; } .inverse > h2 { color: #e64173; 
font-weight: 300; font-size: 40px; font-style: italic; margin-top: -25px; } .title-slide > h2 { margin-top: -25px; padding-bottom: -20px; color: rgba(249, 38, 114, 0.75); text-shadow: none; font-weight: 300; font-size: 35px; font-style: normal; text-align: left; margin-left: 15px; } .remark-inline-code { background: #F5F5F5; /* lighter */ /* background: #e7e8e2; /* darker */ border-radius: 3px; padding: 4px; } /* 2/3 left; 1/3 right */ .more-left { float: left; width: 63%; } .less-right { float: right; width: 31%; } .more-right ~ * { clear: both; } /* 9/10 left; 1/10 right */ .left90 { padding-top: 0.7em; float: left; width: 85%; } .right10 { padding-top: 0.7em; float: right; width: 9%; } /* 95% left; 5% right */ .left95 { padding-top: 0.7em; float: left; width: 91%; } .right05 { padding-top: 0.7em; float: right; width: 5%; } .left5 { padding-top: 0.7em; margin-left: 0em; margin-right: -0.4em; float: left; width: 7%; } .left10 { padding-top: 0.7em; margin-left: -0.2em; margin-right: -0.5em; float: left; width: 10%; } .left30 { padding-top: 0.7em; float: left; width: 30%; } .right30 { padding-top: 0.7em; float: right; width: 30%; } .thin-left { padding-top: 0.7em; margin-left: -1em; margin-right: -0.5em; float: left; width: 27.5%; } /* Example */ .ex { font-weight: 300; color: #cccccc !important; font-style: italic; } .col-left { float: left; width: 47%; margin-top: -1em; } .col-right { float: right; width: 47%; margin-top: -1em; } .clear-up { clear: both; margin-top: -1em; } /* Format tables */ table { color: #000000; font-size: 14pt; line-height: 100%; border-top: 1px solid #ffffff !important; border-bottom: 1px solid #ffffff !important; } th, td { background-color: #ffffff; } table th { font-weight: 400; } /* Extra left padding */ .pad-left { margin-left: 5%; } /* Extra left padding */ .big-left { margin-left: 15%; margin-bottom: -0.4em; } /* Attention */ .attn { font-weight: 500; color: #e64173 !important; font-family: 'Zilla Slab' !important; } /* Note */ .note 
{ font-weight: 300; font-style: italic; color: #314f4f !important; /* color: #cccccc !important; */ font-family: 'Zilla Slab' !important; } /* Question and answer */ .qa { font-weight: 500; /* color: #314f4f !important; */ color: #e64173 !important; font-family: 'Zilla Slab' !important; } /* Remove orange line */ hr, .title-slide h2::after, .mline h1::after { content: ''; display: block; border: none; background-color: #e5e5e5; color: #e5e5e5; height: 1px; } </style> <!-- From xaringancolor --> <div style = "position:fixed; visibility: hidden"> `\(\require{color}\definecolor{red_pink}{rgb}{0.901960784313726, 0.254901960784314, 0.450980392156863}\)` `\(\require{color}\definecolor{turquoise}{rgb}{0.125490196078431, 0.698039215686274, 0.666666666666667}\)` `\(\require{color}\definecolor{orange}{rgb}{1, 0.647058823529412, 0}\)` `\(\require{color}\definecolor{red}{rgb}{0.984313725490196, 0.380392156862745, 0.0274509803921569}\)` `\(\require{color}\definecolor{blue}{rgb}{0.231372549019608, 0.231372549019608, 0.603921568627451}\)` `\(\require{color}\definecolor{green}{rgb}{0.545098039215686, 0.694117647058824, 0.454901960784314}\)` `\(\require{color}\definecolor{grey_light}{rgb}{0.701960784313725, 0.701960784313725, 0.701960784313725}\)` `\(\require{color}\definecolor{grey_mid}{rgb}{0.498039215686275, 0.498039215686275, 0.498039215686275}\)` `\(\require{color}\definecolor{grey_dark}{rgb}{0.2, 0.2, 0.2}\)` `\(\require{color}\definecolor{purple}{rgb}{0.415686274509804, 0.352941176470588, 0.803921568627451}\)` `\(\require{color}\definecolor{slate}{rgb}{0.192156862745098, 0.309803921568627, 0.309803921568627}\)` </div> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { Macros: { red_pink: ["{\color{red_pink}{#1}}", 1], turquoise: ["{\color{turquoise}{#1}}", 1], orange: ["{\color{orange}{#1}}", 1], red: ["{\color{red}{#1}}", 1], blue: ["{\color{blue}{#1}}", 1], green: ["{\color{green}{#1}}", 1], grey_light: ["{\color{grey_light}{#1}}", 1], grey_mid: 
["{\color{grey_mid}{#1}}", 1], grey_dark: ["{\color{grey_dark}{#1}}", 1], purple: ["{\color{purple}{#1}}", 1], slate: ["{\color{slate}{#1}}", 1] }, loader: {load: ['[tex]/color']}, tex: {packages: {'[+]': ['color']}} } }); </script> <style> .red_pink {color: #E64173;} .turquoise {color: #20B2AA;} .orange {color: #FFA500;} .red {color: #FB6107;} .blue {color: #3B3B9A;} .green {color: #8BB174;} .grey_light {color: #B3B3B3;} .grey_mid {color: #7F7F7F;} .grey_dark {color: #333333;} .purple {color: #6A5ACD;} .slate {color: #314F4F;} </style>

### Chapter 2: Describing Distributions with Numbers

---

## Chapter Overview

- Population vs. Sample
- Measures of Central Tendency
  - Mean
  - Median
- Measures of Variability
  - Quartiles
  - Variance \& Standard Deviation

---

# Population vs Sample

- .hi.purple[Population]: the entire set of entities under study
  - Examples: all men, all NBA players, all children under 5
- .hi.green[Sample]: a subset of the population
  - Can be used to draw inferences about the population
  - Examples: our class, Denver Nuggets players, daycares in Colorado
- We are interested in parameters of the .hi.purple[population] distribution. Because collecting data on the entire population is usually infeasible, we estimate these parameters using data from .hi.green[samples]

---

# Population Distribution

The following graph depicts the underlying population distribution

- We are interested in its parameters, but are unable to collect data on every single observation

<img src="data:image/png;base64,#ch2_files/figure-html/population-density-1.svg" width="60%" style="display: block; margin: auto;" />

---

# Population Inference

Instead, we draw a sample from the population and use the sample distribution to estimate the parameters of interest

.center[
<img style="width:80%;" src="data:image/png;base64,#sample_anim.gif"/>
]

---

# Parameters of Interest

Two primary types of .hi.purple[population] parameters of interest:

- Measures of central tendency:
  - Population .orange[mean], `\(\mu\)`
  - Population
.red_pink[median]
- Measures of variability:
  - Population .blue[variance], `\(\sigma^2\)`

We will .it.green[estimate] these using the .hi.green[sample] distribution

---

# Measuring Center: the Mean

The most common measure of center is the arithmetic average, or .hi.orange[mean]

`$${\color{orange} \bar{x}} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$`

or more compactly:

`$${\color{orange} \bar{x}}=\frac{1}{n}\sum_{i=1}^n x_i$$`

---

# Population Inference: Mean

.center[
<img style="width:80%;" src="data:image/png;base64,#sample_anim_mean.gif"/>
]

---

# Measuring Center: the Median

The .hi.red_pink[median] is the midpoint of a distribution

- It is more resistant than the mean to the influence of .hi[extreme observations]

How to calculate the median:

- Arrange observations from smallest to largest
- If there is an odd number of observations, the median is the center observation. If there is an even number of observations, the median is the average of the two center observations

---

# Mean vs. Median

- Although we will primarily be using the mean throughout the semester, the biggest drawback of the mean is that it is not resistant to .hi.purple[outliers]
- The median, however, is resistant to .hi.purple[outliers], so it can be important to calculate for smaller samples

.center[
<img style="width: 60%;" src="data:image/png;base64,#meme.png"/>
]

---

# Mean vs. Median Example

<img src="data:image/png;base64,#ch2_files/figure-html/rodman-graph-1.svg" width="90%" style="display: block; margin: auto;" />

.hi[Median]: 205.5 rebounds and .hi[Mean]: 250.5 rebounds

---

# Clicker Question

What is the sample average age of the participants?
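
Sample of individuals

| Age | Sex    | BMI   | Drinks per week |
|-----|--------|-------|-----------------|
| 59  | male   | 32.26 | 3 drinks        |
| 62  | male   | 25.09 | 2 drinks        |
| 60  | female | 32.58 | 1 drink         |
| 18  | male   | 99.99 | 6 drinks        |
| 57  | female | 31.88 | 2 drinks        |
| 56  | male   | 42.80 | 3 drinks        |
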
<ol type="a">
  <li>58</li>
  <li>51.2</li>
  <li>52</li>
  <li>49.7</li>
</ol>

---

# Clicker Question

Which measure of central tendency best describes the age of participants?
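
Sample of individuals

| Age | Sex    | BMI   | Drinks per week |
|-----|--------|-------|-----------------|
| 59  | male   | 32.26 | 3 drinks        |
| 62  | male   | 25.09 | 2 drinks        |
| 60  | female | 32.58 | 1 drink         |
| 18  | male   | 99.99 | 6 drinks        |
| 57  | female | 31.88 | 2 drinks        |
| 56  | male   | 42.80 | 3 drinks        |
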
<ol type="a">
  <li>Median</li>
  <li>Mean</li>
</ol>

---

# Measuring Variability

Measures of central tendency do not tell the whole story. To further characterize the distribution, we need to know how spread out the data are

- Quartiles
- Variance

---

# Variability: Quartiles

- A measure of center alone can be misleading
- How to calculate quartiles:
  - Arrange observations in increasing order and locate the .hi.red_pink[median]
  - The .hi.green[first quartile] is the median of the observations located to the left of the median
  - The .hi.green[third quartile] is the median of the observations located to the right of the median

<img src="data:image/png;base64,#quartiles.png" width="50%" style="display: block; margin: auto;" />

---

# Boxplots

.hi.blue[Five-number summary]: the smallest observation (minimum), the first quartile, the median, the third quartile, and the largest observation (maximum)

We can use a .hi.purple[boxplot], built from this five-number summary, to display quantitative data

- How to make a boxplot:
  - A central box spans the first and third quartiles
  - A line in the box marks the median
  - Lines extend from the box out to the smallest and largest observations

---

# Boxplots

<img src="data:image/png;base64,#ch2_files/figure-html/rodman-box-1.svg" width="80%" style="display: block; margin: auto;" />

---

# Interquartile Range

The .hi.turquoise[interquartile range], IQR, is the distance between the first and third quartiles

- IQR = `\(Q_3 - Q_1\)`
- The IQR measures the spread of the data and also helps to identify outliers

Rule for outliers:

- An observation is an outlier if it falls more than `\(1.5 \times IQR\)` above the third quartile or below the first quartile

---

# Variability: Variance

.hi.purple[Variance]: denoted `\(s^2\)`, measures how "spread out" the data are on average

`$$s^2 = \frac{(x_1-{\color{orange}\bar{x}})^2 + (x_2-{\color{orange}\bar{x}})^2 + \cdots + (x_n - {\color{orange}\bar{x}})^2}{n-1},$$`

or more compactly

$$
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - {\color{orange}\bar{x}})^2
$$

.hi.purple[Standard deviation]: denoted `\(s\)`, the square root of the variance; it measures how far observations typically are from the mean

`$$s=\sqrt{\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{x})^2}$$`

---

# Visualizing Standard Deviation

<img src="data:image/png;base64,#ch2_files/figure-html/multiple-vars-1.svg" width="90%" style="display: block; margin: auto;" />

---

# Practice Question

Calculate the standard deviation of age.
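
Sample of individuals

| Age | Sex    | BMI   | Drinks per week |
|-----|--------|-------|-----------------|
| 59  | male   | 32.26 | 3 drinks        |
| 62  | male   | 25.09 | 2 drinks        |
| 60  | female | 32.58 | 1 drink         |
| 18  | male   | 99.99 | 6 drinks        |
| 57  | female | 31.88 | 2 drinks        |
| 56  | male   | 42.80 | 3 drinks        |
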
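
---

# Practice Question: A Computational Check

One way to check the hand calculation is to apply the sample variance formula directly. The sketch below is only an illustration: it uses Python's standard library rather than R, the variable names are hypothetical, and the ages are copied from the "Sample of individuals" table.

```python
import math
import statistics

# Ages copied from the "Sample of individuals" table
ages = [59, 62, 60, 18, 57, 56]

n = len(ages)
x_bar = sum(ages) / n  # sample mean

# Sample variance: sum of squared deviations from the mean, divided by n - 1
s_squared = sum((x - x_bar) ** 2 for x in ages) / (n - 1)
s = math.sqrt(s_squared)  # sample standard deviation

# Cross-check the manual formula against the standard library
assert math.isclose(s, statistics.stdev(ages))

# Illustration only: the same calculation without the 18-year-old
s_without = statistics.stdev([a for a in ages if a != 18])

print(f"mean = {x_bar}, variance = {s_squared}, sd = {s:.2f}")
print(f"sd without the 18-year-old = {s_without:.2f}")
```

Comparing the two printed standard deviations shows how much a single unusual observation can contribute to `\(s\)`.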
---

# Properties of Standard Deviation, `\(s\)`

- `\(n-1\)` is referred to as the degrees of freedom
- `\(s\)` measures variability about the mean
- `\(s\)` is always greater than or equal to zero, but usually `\(> 0\)`
  - When would it be `\(=0\)`?
- As observations become more variable, `\(s\)` gets larger
- `\(s\)` is not resistant, in the same way that the sample mean is not resistant; a few outliers can change it a lot.

---

# Summary of Summary Statistics

Two basic ways to summarize the center and spread of a distribution

- Mean and standard deviation (or variance)
- The five-number summary

.hi.slate[When to Use Which]

Use `\(\bar{x}\)` and `\(s\)` when the distribution is reasonably symmetric and free of outliers

Use the five-number summary if the distribution is skewed or has outliers

---

# Greek Letters and Statistics

.pull-left[
.hi.purple[Greek Letters]

- Greek letters like `\(\mu\)` and `\(\sigma^2\)` represent the truth about the population.
]

.pull-right[
.hi.green[Latin Letters]

- Latin letters like `\(\bar{x}\)` and `\(s^2\)` are calculations that represent guesses (estimates) at the population values.
]

The goal for the class is for the Latin letters to be good guesses for the Greek letters:

$$
{\color{green}\text{Data}} \longrightarrow {\color{green}\text{Calculation}} \longrightarrow {\color{green}\text{Estimates}} \longrightarrow^{hopefully!} {\color{purple}\text{Truth}}
$$

For example,

$$
{\color{green}X} \longrightarrow {\color{green} \frac{1}{n} \sum_{i=1}^n X_i} \longrightarrow {\color{green}\bar{x}} \longrightarrow^{hopefully!} {\color{purple}\mu}
$$

---

# Install R and RStudio

.hi[Download R:] [https://www.r-project.org/](https://www.r-project.org/)

- Click the "download R" link under "Getting Started"
- Select a CRAN location (mirror site) and click the link
  - I selected the UC Berkeley one; pick one in the USA
- Click on the "Download R for Mac/Windows/etc" link at the top of the page
- Click on the package to download, under "Latest Release"
- Save the .pkg file, double-click to open it, and follow the instructions

.hi[Download RStudio:] [https://www.rstudio.com/](https://www.rstudio.com/)

- Go to www.rstudio.com and click "Download RStudio"
- Click on "download RStudio Desktop"

---

# How to use R

<img src="data:image/png;base64,#r.png" width="90%" style="display: block; margin: auto;" />